Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)


Chapter 8

Optimised Peptide Pattern Discovery

When analysing the patterns of protease cleavage sites or posttranslational modification sites based on peptide data, a key question is whether it is possible to discover interpretable, explainable and visible rules that make clear how peptides are classified. A linear model offers a more interpretable relationship between experimental data, such as peptides, and peptide labels. However, the relationship between the peptides used in either protease cleavage pattern discovery or posttranslational modification pattern discovery and their labels may not always be simple. Moreover, peptides are non-numerical data. On the other hand, most nonlinear models, such as neural network models, do not offer sufficient insight into the data. Decision-tree and random forest algorithms are capable of providing a better interpretation of a model. However, discovering the optimal models requires an expensive exhaustive enumeration. This is why evolutionary computation approaches have provided a better way and have been widely employed in many areas for generating optimal or near-optimal models with better interpretability. This chapter introduces a different type of machine learning approach for this kind of biological pattern discovery: the genetic programming algorithm, which is one type of evolutionary computation approach. It shows how the genetic programming algorithm can be used for discovering the interpretable rules for a peptide data